59 research outputs found
Predicting protein-protein binding sites in membrane proteins
<p>Abstract</p> <p>Background</p> <p>Many integral membrane proteins, like their non-membrane counterparts, form either transient or permanent multi-subunit complexes in order to carry out their biochemical function. Computational methods that provide structural details of these interactions are needed since, despite their importance, relatively few structures of membrane protein complexes are available.</p> <p>Results</p> <p>We present a method for predicting which residues are in protein-protein binding sites within the transmembrane regions of membrane proteins. The method uses a Random Forest classifier trained on residue type distributions and evolutionary conservation for individual surface residues, followed by spatial averaging of the residue scores. The prediction accuracy achieved for membrane proteins is comparable to that for non-membrane proteins. Also, like previous results for non-membrane proteins, the accuracy is significantly higher for residues distant from the binding site boundary. Furthermore, a predictor trained on non-membrane proteins was found to yield poor accuracy on membrane proteins, as expected from the different distribution of surface residue types between the two classes of proteins. Thus, although the same procedure can be used to predict binding sites in membrane and non-membrane proteins, separate predictors trained on each class of proteins are required. Finally, the contribution of each residue property to the overall prediction accuracy is analyzed and prediction examples are discussed.</p> <p>Conclusion</p> <p>Given a membrane protein structure and a multiple alignment of related sequences, the presented method gives a prioritized list of which surface residues participate in intramembrane protein-protein interactions. The method has potential applications in guiding the experimental verification of membrane protein interactions, structure-based drug discovery, and also in constraining the search space for computational methods, such as protein docking or threading, that predict membrane protein complex structures.</p
Towards Universal Structure-Based Prediction of Class II MHC Epitopes for Diverse Allotypes
The binding of peptide fragments of antigens to class II MHC proteins is a crucial step in initiating a helper T cell immune response. The discovery of these peptide epitopes is important for understanding the normal immune response and its misregulation in autoimmunity and allergies and also for vaccine design. In spite of their biomedical importance, the high diversity of class II MHC proteins combined with the large number of possible peptide sequences make comprehensive experimental determination of epitopes for all MHC allotypes infeasible. Computational methods can address this need by predicting epitopes for a particular MHC allotype. We present a structure-based method for predicting class II epitopes that combines molecular mechanics docking of a fully flexible peptide into the MHC binding cleft followed by binding affinity prediction using a machine learning classifier trained on interaction energy components calculated from the docking solution. Although the primary advantage of structure-based prediction methods over the commonly employed sequence-based methods is their applicability to essentially any MHC allotype, this has not yet been convincingly demonstrated. In order to test the transferability of the prediction method to different MHC proteins, we trained the scoring method on binding data for DRB1*0101 and used it to make predictions for multiple MHC allotypes with distinct peptide binding specificities including representatives from the other human class II MHC loci, HLA-DP and HLA-DQ, as well as for two murine allotypes. The results showed that the prediction method was able to achieve significant discrimination between epitope and non-epitope peptides for all MHC allotypes examined, based on AUC values in the range 0.632–0.821. We also discuss how accounting for peptide binding in multiple registers to class II MHC largely explains the systematically worse performance of prediction methods for class II MHC compared with those for class I MHC based on quantitative prediction performance estimates for peptide binding to class II MHC in a fixed register
Learning a peptide-protein binding affinity predictor with kernel ridge regression
We propose a specialized string kernel for small bio-molecules, peptides and
pseudo-sequences of binding interfaces. The kernel incorporates
physico-chemical properties of amino acids and elegantly generalize eight
kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the
Radial Basis Function. We provide a low complexity dynamic programming
algorithm for the exact computation of the kernel and a linear time algorithm
for it's approximation. Combined with kernel ridge regression and SupCK, a
novel binding pocket kernel, the proposed kernel yields biologically relevant
and good prediction accuracy on the PepX database. For the first time, a
machine learning predictor is capable of accurately predicting the binding
affinity of any peptide to any protein. The method was also applied to both
single-target and pan-specific Major Histocompatibility Complex class II
benchmark datasets and three Quantitative Structure Affinity Model benchmark
datasets.
On all benchmarks, our method significantly (p-value < 0.057) outperforms the
current state-of-the-art methods at predicting peptide-protein binding
affinities. The proposed approach is flexible and can be applied to predict any
quantitative biological activity. The method should be of value to a large
segment of the research community with the potential to accelerate
peptide-based drug and vaccine development.Comment: 22 pages, 4 figures, 5 table
Predicting Peptide Binding Affinities to MHC Molecules Using a Modified Semi-Empirical Scoring Function
The Major Histocompatibility Complex (MHC) plays an important role in the human immune system. The MHC is involved in the antigen presentation system assisting T cells to identify foreign or pathogenic proteins. However, an MHC molecule binding a self-peptide may incorrectly trigger an immune response and cause an autoimmune disease, such as multiple sclerosis. Understanding the molecular mechanism of this process will greatly assist in determining the aetiology of various diseases and in the design of effective drugs. In the present study, we have used the Fresno semi-empirical scoring function and modify the approach to the prediction of peptide-MHC binding by using open-source and public domain software. We apply the method to HLA class II alleles DR15, DR1, and DR4, and the HLA class I allele HLA A2. Our analysis shows that using a large set of binding data and multiple crystal structures improves the predictive capability of the method. The performance of the method is also shown to be correlated to the structural similarity of the crystal structures used. We have exposed some of the obstacles faced by structure-based prediction methods and proposed possible solutions to those obstacles. It is envisaged that these obstacles need to be addressed before the performance of structure-based methods can be on par with the sequence-based methods
Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins
<p>Abstract</p> <p>Background</p> <p>Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, the evolution information has not been exploited to achieve a more powerful propensity.</p> <p>Result</p> <p>In this study, the residue interface propensities of four kinds of complexes (homo-permanent complexes, homo-transient complexes, hetero-permanent complexes and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are inputted to the support vector machine for the prediction of protein binding sites. Such propensities are further improved by taking evolutional information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profiles interface propensities. Experiment is performed on the 1139 non-redundant protein chains. Although different residue interface propensities among different complexes are observed, the improvement of the classifier with residue interface propensities can be negligible in comparison with that without propensities. The binary profile interface propensities can significantly improve the performance of binding sites prediction by about ten percent in term of both precision and recall.</p> <p>Conclusion</p> <p>Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities can significantly improve the performance of binding sites prediction of protein, which indicates that the propensities at the profile level are more accurate than those at the residue level.</p
Interaction site prediction by structural similarity to neighboring clusters in protein-protein interaction networks
<p>Abstract</p> <p>Background</p> <p>Recently, revealing the function of proteins with protein-protein interaction (PPI) networks is regarded as one of important issues in bioinformatics. With the development of experimental methods such as the yeast two-hybrid method, the data of protein interaction have been increasing extremely. Many databases dealing with these data comprehensively have been constructed and applied to analyzing PPI networks. However, few research on prediction interaction sites using both PPI networks and the 3D protein structures complementarily has explored.</p> <p>Results</p> <p>We propose a method of predicting interaction sites in proteins with unknown function by using both of PPI networks and protein structures. For a protein with unknown function as a target, several clusters are extracted from the neighboring proteins based on their structural similarity. Then, interaction sites are predicted by extracting similar sites from the group of a protein cluster and the target protein. Moreover, the proposed method can improve the prediction accuracy by introducing repetitive prediction process.</p> <p>Conclusions</p> <p>The proposed method has been applied to small scale dataset, then the effectiveness of the method has been confirmed. The challenge will now be to apply the method to large-scale datasets.</p
MultiRTA: A simple yet reliable method for predicting peptide binding affinities for multiple class II MHC allotypes
abstract: Background
The binding of peptide fragments of antigens to class II MHC is a crucial step in initiating a helper T cell immune response. The identification of such peptide epitopes has potential applications in vaccine design and in better understanding autoimmune diseases and allergies. However, comprehensive experimental determination of peptide-MHC binding affinities is infeasible due to MHC diversity and the large number of possible peptide sequences. Computational methods trained on the limited experimental binding data can address this challenge. We present the MultiRTA method, an extension of our previous single-type RTA prediction method, which allows the prediction of peptide binding affinities for multiple MHC allotypes not used to train the model. Thus predictions can be made for many MHC allotypes for which experimental binding data is unavailable.
Results
We fit MultiRTA models for both HLA-DR and HLA-DP using large experimental binding data sets. The performance in predicting binding affinities for novel MHC allotypes, not in the training set, was tested in two different ways. First, we performed leave-one-allele-out cross-validation, in which predictions are made for one allotype using a model fit to binding data for the remaining MHC allotypes. Comparison of the HLA-DR results with those of two other prediction methods applied to the same data sets showed that MultiRTA achieved performance comparable to NetMHCIIpan and better than the earlier TEPITOPE method. We also directly tested model transferability by making leave-one-allele-out predictions for additional experimentally characterized sets of overlapping peptide epitopes binding to multiple MHC allotypes. In addition, we determined the applicability of prediction methods like MultiRTA to other MHC allotypes by examining the degree of MHC variation accounted for in the training set. An examination of predictions for the promiscuous binding CLIP peptide revealed variations in binding affinity among alleles as well as potentially distinct binding registers for HLA-DR and HLA-DP. Finally, we analyzed the optimal MultiRTA parameters to discover the most important peptide residues for promiscuous and allele-specific binding to HLA-DR and HLA-DP allotypes.
Conclusions
The MultiRTA method yields competitive performance but with a significantly simpler and physically interpretable model compared with previous prediction methods. A MultiRTA prediction webserver is available at http://bordnerlab.org/MultiRTA.The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-48
PREDIVAC: CD4+T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity
Background: CD4+ T-cell epitopes play a crucial role in eliciting vigorous protective immune responses during peptide (epitope)-based vaccination. The prediction of these epitopes focuses on the peptide binding process by MHC class II proteins. The ability to account for MHC class II polymorphism is critical for epitope-based vaccine design tools, as different allelic variants can have different peptide repertoires. In addition, the specificity of CD4+ T-cells is often directed to a very limited set of immunodominant peptides in pathogen proteins. The ability to predict what epitopes are most likely to dominate an immune response remains a challenge
A mathematical and computational review of Hartree-Fock SCF methods in Quantum Chemistry
We present here a review of the fundamental topics of Hartree-Fock theory in
Quantum Chemistry. From the molecular Hamiltonian, using and discussing the
Born-Oppenheimer approximation, we arrive to the Hartree and Hartree-Fock
equations for the electronic problem. Special emphasis is placed in the most
relevant mathematical aspects of the theoretical derivation of the final
equations, as well as in the results regarding the existence and uniqueness of
their solutions. All Hartree-Fock versions with different spin restrictions are
systematically extracted from the general case, thus providing a unifying
framework. Then, the discretization of the one-electron orbitals space is
reviewed and the Roothaan-Hall formalism introduced. This leads to a exposition
of the basic underlying concepts related to the construction and selection of
Gaussian basis sets, focusing in algorithmic efficiency issues. Finally, we
close the review with a section in which the most relevant modern developments
(specially those related to the design of linear-scaling methods) are commented
and linked to the issues discussed. The whole work is intentionally
introductory and rather self-contained, so that it may be useful for non
experts that aim to use quantum chemical methods in interdisciplinary
applications. Moreover, much material that is found scattered in the literature
has been put together here to facilitate comprehension and to serve as a handy
reference.Comment: 64 pages, 3 figures, tMPH2e.cls style file, doublesp, mathbbol and
subeqn package
- …